Abstract

As disk arrays become widely used, tools for understanding and analyzing their performance 
become increasingly important. In particular, performance models can be invaluable in both 
configuring and designing disk arrays. Accurate analytic performance models are preferable to
other types of models because they can be quickly evaluated, are applicable under a wide range
of system and workload parameters, and can be manipulated by a range of mathematical
techniques. Unfortunately, analytic performance models of disk arrays are difficult to formulate due
to the presence of queueing and fork-join synchronization; a disk array request is broken up into
independent disk requests which must all complete to satisfy the original request. In this paper,
we develop, validate and apply an analytic performance model for disk arrays. We derive simple 
equations for approximating their utilization, response time and throughput. We then validate 
the analytic model via simulation and investigate the accuracy of each approximation used in 
deriving the analytic model. Finally, we apply the analytic model to derive an equation for the 
optimal unit of data striping in disk arrays. 


1 Introduction 

In recent years, improvements in microprocessor performance have greatly outpaced improvements
in I/O performance. If the trend continues, future improvements in microprocessor performance 
will be wasted as computer systems become increasingly I/O bound. To overcome the impending 
I/O crisis, several researchers [7,8,10,12,13,15] have proposed the use of disk arrays that stripe data 
across multiple disks and provide improved I/O performance by using parallelism to increase data 
transfer rates and by servicing multiple I/O requests concurrently. 

Given the important role disk arrays will play in the I/O systems of tomorrow, tools for
understanding their performance become increasingly important. In particular, performance models,
combined with a thorough understanding of an installation’s workload, will be invaluable in both
configuring and designing disk arrays. In general, accurate analytic performance models are
preferable to other types of models, such as empirical and simulation models, because they can be
quickly evaluated, are applicable under a wide range of system and workload parameters, and can be
manipulated by a range of mathematical techniques. Even when analytic models are not directly
applicable to a particular system or workload, they are frequently useful for quickly analyzing
general properties of the system, stimulating intuition and furthering understanding.

Unfortunately, analytic performance models of disk arrays are difficult to formulate due to the
presence of queueing and fork-join synchronization; a disk array request is broken up into
independent disk requests which must all complete to satisfy the disk array request. Exact analytic
solutions for the two-server fork-join queue given Poisson arrivals and independent service times
currently exist [1,5], but the k-server fork-join queue remains unsolved. Other related work in the
field falls into four primary categories: (1) simulation studies, (2) analytic models that ignore
queueing effects, (3) analytic models that ignore fork-join synchronization and (4) restricted
queueing models that deal with fork-join synchronization using specialized techniques not easily
extended to modeling disk arrays. Most analytic queueing studies deal with general queueing systems
rather than disk arrays in particular. The following list gives previous work that is
representative of the field.

• Kim [8] investigates the performance of n independent disks without data striping versus n 
synchronized disks with data striping; the n disks are essentially equivalent to a single disk 
with n times higher data transfer rate. She derives equations for response time assuming each 
disk is an M/G/1 system. Because the disks are completely synchronized, she avoids fork-join
synchronization altogether. 

• Livny [10] investigates the performance of declustering, where data is striped in 26KB units,
versus clustering, where data is not striped, over a range of transaction workloads via
simulation.

• Reddy [14] investigates the performance tradeoff between synchronized fine-grained data
striping and asynchronous coarse-grained data striping via simulation. He also proposes and
investigates hybrid schemes that combine aspects of synchronized fine-grained data striping
and asynchronous coarse-grained data striping.

• Chen [4] derives empirical rules for optimally selecting the unit of data striping in disk arrays 
over a range of workloads via simulation. 

• Salem and Garcia-Molina [15]; Kim and Tantawi [9]; Bitton and Gray [2] derive minimum 
response time formulas (no queueing) for asynchronous disk arrays. 

• Patterson, Gibson and Katz [13] derive analytic formulas for maximum throughput in RAIDs
(Redundant Arrays of Inexpensive Disks) which are subsequently verified by Chen [3] via
measurement.

• Heidelberger and Trivedi [6] formulate an analytic model for systems with forks but no joins.

In this paper, we develop, validate and apply an analytic performance model for disk arrays. Our 
model is different from previous analytic models of disk arrays mentioned above for the following 
reasons. First, we use a closed queueing model with a fixed number of processes whereas previous 
analytic models of disk arrays have used open queueing models with Poisson arrivals. A closed model 
more accurately models the synchronous I/O behavior of scientific, time-sharing and distributed 
systems. In such systems, processes tend to wait for previous I/O requests to complete before
issuing new I/O requests, whereas in transaction based systems, I/O requests are issued at random 
points in time regardless of whether the previous I/O requests have completed. Second, to the best 
of our knowledge, this is the first analytic model for disk arrays that handles both the queueing at 
individual disks and the fork-join synchronization introduced by data striping. Previous analytic 
models that handle both queueing and fork-join synchronization cannot easily be applied to disk 
arrays because they assume service times across servers (disks) are independent whereas in disk 
arrays, they are very much dependent. 

Figure 1: Data Striping.


In the following sections, we first derive an exact expression for the utilization of the model 
system. Because the exact expression contains parameters that are difficult or impossible to
compute, we either analytically approximate or empirically calibrate the difficult parameters to make
the expression more tractable. From the resulting approximate equation for utilization, we derive 
equations for response time and throughput. We then validate the analytic model via simulation 
and investigate the accuracy and sensitivity of each approximation used in deriving the analytic 
model. Finally, we apply the analytic model to derive an equation for determining the optimal unit 
of data striping in disk arrays. 


2 Definitions 

Disk arrays provide high I/O performance by striping data over multiple disks. High performance 
is achieved by servicing multiple I/O requests concurrently and by using more than one disk to 
service a single request in a parallel manner. 

Figure 1 illustrates the basic disk array of interest along with the terms stripe unit and data
stripe, which we formally define as follows:

Stripe unit is the unit of data interleaving, that is, the amount of data that is placed on a disk 
before data is placed on the next disk. Stripe units typically range from a sector to a track in 
size (512 bytes to 64 kilobytes). Figure 1 illustrates a disk array with five disks with the first 
ten stripe units labeled. 

Data stripe is a sequence of logically consecutive stripe units. A logical I/O request to a disk 
array corresponds to a data stripe. Figure 1 illustrates a data stripe consisting of four stripe 
units spanning stripe units three through six. 
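
As a minimal sketch of these definitions, the following Python fragment maps a stripe unit to its disk under round-robin interleaving. The function names, the zero-based numbering of stripe units and disks, and the `row` coordinate are assumptions made for illustration; the text above does not fix a numbering origin.

```python
def locate_stripe_unit(index, num_disks):
    """Return (disk, row) for a zero-based stripe unit index.

    Consecutive stripe units go to consecutive disks; after the last disk
    the placement wraps around to the first disk on the next row.
    """
    return index % num_disks, index // num_disks


def disks_for_request(first_unit, num_units, num_disks):
    """List the disks touched by a data stripe of num_units stripe units."""
    return [locate_stripe_unit(first_unit + k, num_disks)[0]
            for k in range(num_units)]


if __name__ == "__main__":
    # Five disks, a data stripe of four stripe units starting at unit 3.
    print(disks_for_request(3, 4, 5))
```

With five disks, a data stripe of four stripe units touches four distinct disks and wraps around from the last disk to the first.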


3 The Analytic Model 

In this section, we derive equations to approximate the performance of disk arrays. Our approach 
is to derive the expected utilization of a given disk in the disk array. Because we are modeling 
a closed system where each disk plays a symmetric role with respect to each other, knowing the 
expected utilization of a given disk in the system will allow us to compute the system’s throughput 
and response time. 

3.1 The Model System

Consider the closed queueing system illustrated by Figure 2. The system consists of L processes, 
each of which issues, one at a time, an array request of size n stripe units. Each array request 
is broken up into n disk requests and the disk requests are queued round-robin starting from a 
randomly chosen disk. Each disk services a single disk request at a time in a FIFO manner. When 
all of the disk requests corresponding to an array request are serviced, the process that issued the 
array request issues another array request, repeating the cycle. Note that two array requests may 
partially overlap on some of the disks, resulting in complex interactions. We sometimes refer to 
array requests simply as requests. The parameters of the above system are as follows: 

L = Number of processes issuing requests. 

N = Number of disks. 

n = Request size (number of disks/stripe units accessed per request);
n ≤ N.

S = Service time of a given disk request. 

In the derivation of the analytic model, we will assume that L and n are fixed. We will also assume 
that the processes do nothing but issue I/O requests. 
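
The model system can be sketched as a small event-driven simulation. This is an illustrative sketch under stated assumptions, not the simulator used in this paper: the function name, the `service_time` callback, the fixed number of simulated requests, and the way utilization is measured at the end of the run are all hypothetical choices. Because each disk is FIFO and array requests are processed in order of issue time, the finish time of a disk request can be computed as soon as it is queued.

```python
import heapq
import random


def simulate_utilization(N, n, L, service_time, num_requests=20000, seed=0):
    """Estimate mean disk utilization of the closed fork-join model.

    N: number of disks, n: disk requests per array request (n <= N),
    L: number of processes, service_time: callable taking a random.Random
    and returning one disk service time.
    """
    rng = random.Random(seed)
    free_at = [0.0] * N          # time at which each disk's FIFO queue drains
    busy = 0.0                   # total service time summed over all disks
    ready = [0.0] * L            # min-heap of each process's next issue time
    heapq.heapify(ready)

    for _ in range(num_requests):
        now = heapq.heappop(ready)           # earliest process ready to issue
        first = rng.randrange(N)             # randomly chosen starting disk
        done = now
        for k in range(n):                   # fork: n consecutive disks
            d = (first + k) % N
            s = service_time(rng)
            free_at[d] = max(free_at[d], now) + s
            busy += s
            done = max(done, free_at[d])     # join: wait for the slowest disk
        heapq.heappush(ready, done)          # process reissues on completion

    horizon = max(free_at)
    return busy / (N * horizon)


if __name__ == "__main__":
    # Exponential service times with mean 1 are an arbitrary choice here.
    u = simulate_utilization(17, 4, 8, lambda rng: rng.expovariate(1.0))
    print(f"simulated utilization: {u:.3f}")
```

With a single process, single-disk requests and constant service times, exactly one of the N disks is busy at any time, so the estimate is 1/N.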

Figure 3: Time-line of Events at a Given Disk. After the disk request finishes service at time $t_2$,
M = 2 array requests that do not access the given disk are issued at times $t_3$ and $t_4$ before an array
request that accesses the given disk is issued at time $t_5$. The disk remains idle for a time period of
$W = r_0 + r_1 + r_2$.


3.2 The Expected Utilization 

In deriving the expected utilization of the model system, the following definitions will prove
useful:

U = Expected utilization of a given disk. 

R = Response time of a given array request. 

W = Disk idle (wait) time between disk request servicings. 

Q = Queue length at a given disk. 

$p_0$ = Probability that the queue at a given disk is empty
when the disk finishes servicing a disk request.

p = Probability that a request will access a given disk;
n/N.

If we visualize the activity at a given disk as an alternating sequence of busy periods of length
S and idle periods of length W, the expected utilization of a given disk is,

$$U = \frac{E(S)}{E(S) + E(W)}. \tag{1}$$


Idle periods of length zero can occur and imply that another disk request is already waiting for 
service, Q > 0, when the current disk request finishes service. 

Let $r_0$ denote the time between the end of service of a given disk request and the issuing of
a new array request into the system. Let $r_i$, $i \in \{1, 2, \ldots\}$, denote the successive time intervals
between successive issues of array requests, numbered relative to $r_0$. Let M denote the number of
array requests that are issued after a given disk finishes a disk request until, but excluding, the
array request that accesses the given disk. Since each array request has probability p of accessing
a given disk, M is geometrically distributed and E(M) = 1/p - 1. Figure 3 illustrates the above
terms.
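
The mean of M follows directly from the geometric distribution. As a short worked step, with M counting the array requests that miss the given disk before the first one that accesses it:

```latex
\begin{align*}
P(M = m) &= (1 - p)^m \, p, \qquad m = 0, 1, 2, \ldots \\
E(M) &= \sum_{m=0}^{\infty} m \, (1 - p)^m \, p = \frac{1 - p}{p} = \frac{1}{p} - 1 .
\end{align*}
```

For instance, taking n = 4 and N = 17 as illustrative values gives p = 4/17 and E(M) = 17/4 - 1 = 3.25.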

By conditioning on the queue length at the time a disk request finishes service, we can write,

$$E(W) = P(Q > 0)\,E(W \mid Q > 0) + P(Q = 0)\,E(W \mid Q = 0),$$

$$E(W) = (1 - p_0) \cdot 0 + p_0\,E\!\left(\sum_{i=0}^{M} r_i\right),$$

$$E(W) = p_0 \left( E(r_0) + E\!\left(\sum_{i=1}^{M} r_i\right) \right).$$

Substituting into Equation 1 we have,

$$U = \frac{E(S)}{E(S) + p_0 \left( E(r_0) + E\!\left(\sum_{i=1}^{M} r_i\right) \right)}. \tag{2}$$

Equation 2 is an exact equation for the expected utilization of the model system.


3.3 Approximating the Expected Utilization 

In the previous section, we formulated an exact equation, Equation 2, for the expected utilization 
of the model system. Unfortunately, the exact equation consists of terms which are very difficult if 
not impossible to compute. In this section, we approximate components of Equation 2 to make it 
analytically tractable. 

To simplify Equation 2, we make the following approximation:
$E\!\left(\sum_{i=1}^{M} r_i\right) \approx E(M)E(r) = (1/p - 1)E(R)/L$. From Little’s Law, we know that the average time
between successive issues of array requests is E(R)/L; thus, the above approximation would be exact
if $r_i$, $i \in \{1, 2, \ldots\}$, were independently distributed with a common mean of E(R)/L. For the moment,
we will take the above approximation as given, but later show via simulation that it is an extremely
good approximation. Thus, we can write,

$$U \approx \frac{E(S)}{E(S) + p_0 \left( E(r_0) + (1/p - 1) E(R)/L \right)}. \tag{3}$$

Given the above approximation, it is natural to assume the following restriction on $E(r_0)$ solely
for the purpose of providing an intuitive feel for the range of $r_0$:

$$0 \le E(r_0) \le E(r). \tag{4}$$

The first inequality must hold since $r_0 \ge 0$, whereas the second is just an intuitive, almost arbitrary,
restriction. The following observations concerning $E(r_0)$ are evident:

• $E(r_0) = 0$ implies that disk requests associated with the same array request finish at the same
time and thus an array request is issued immediately whenever any disk request finishes.

• $E(r_0) = 0$ when n = 1, that is, when each array request consists of a single disk request. In this
case, the completion of each disk request corresponds to the completion of the corresponding
array request and, thus, the process that issued the disk request will immediately issue another
array request.

• $E(r_0) \approx 0$ when n = N, that is, when an array request always uses all the disks. In this
case, disk requests associated with the same array request will tend to finish at close to the
same time because all of the disks will be in very similar states and operate in a lock step
fashion since disk service times are deterministic and disk requests across disks will be almost
identical.

We will find it convenient to express $E(r_0)$ as a multiple of E(R)/L; thus, we introduce the
parameter $\gamma$ as $E(r_0) = \gamma E(R)/L$, where Restriction 4 implies $0 \le \gamma \le 1$. Later, we will empirically
calibrate $\gamma$. For now, we know that $\gamma = 0$ when n = 1 and $\gamma \approx 0$ when n = N. Rewriting
Equation 3 in terms of $\gamma$ we have,

$$U \approx \frac{E(S)}{E(S) + \frac{p_0 E(R)}{L} \left( (1/p - 1) + \gamma \right)}. \tag{5}$$


The following is the key approximation:

$$p_0 E(R)/E(S) = 1. \tag{6}$$


The above equation is true for M/M/1 systems but is unlikely to be completely accurate for the
model system. We will later examine, via simulation, the accuracy, sensitivity and error introduced
by this approximation. We can now rewrite Equation 5 as,

$$U \approx \frac{1}{1 + \frac{1}{L} \left( 1/p - 1 + \gamma \right)}. \tag{7}$$


Note that under the approximations we have made, the expected utilization is insensitive to the disk
service time distribution, S.

Since this is a closed system, the expected response time can be directly calculated from the
expected utilization:

$$E(R) = \frac{E(S)\,L\,n}{U\,N}. \tag{8}$$

The expected throughput in megabytes per second can be written as,

$$\mathit{MBS} = \frac{U \, N \, \mathit{SU}}{E(S)}, \tag{9}$$


where SU is the size of the stripe unit. Future references to a specific analytic model will refer to 
the above equations and to Equation 7 in particular. 
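
Equations 7 through 9 translate directly into code. The sketch below is illustrative only: the function and parameter names are assumptions, and the default value of gamma anticipates the constant that the paper calibrates in a later section. Units are whatever the caller uses consistently (for example, seconds for the mean service time and megabytes for the stripe unit).

```python
def utilization(L, n, N, gamma=0.15):
    """Equation 7: approximate expected disk utilization, with p = n/N."""
    p = n / N
    return 1.0 / (1.0 + (1.0 / p - 1.0 + gamma) / L)


def response_time(L, n, N, mean_service, gamma=0.15):
    """Equation 8: expected array request response time."""
    return mean_service * L * n / (utilization(L, n, N, gamma) * N)


def throughput(L, n, N, mean_service, stripe_unit, gamma=0.15):
    """Equation 9: expected throughput (stripe_unit per unit of time)."""
    return utilization(L, n, N, gamma) * N * stripe_unit / mean_service


if __name__ == "__main__":
    # Hypothetical operating point: 8 processes, 4 of 17 disks per request,
    # 30 ms mean disk service time, 32 KB (0.032 MB) stripe unit.
    L, n, N, es, su = 8, 4, 17, 0.030, 0.032
    print(f"U    = {utilization(L, n, N):.3f}")
    print(f"E(R) = {response_time(L, n, N, es) * 1000:.1f} ms")
    print(f"MBS  = {throughput(L, n, N, es, su):.2f} MB/s")
```

As a consistency check, the product of Equations 8 and 9 is L·n·SU, the amount of data outstanding in the closed system, which is what Little's Law requires.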


3.4 Summary 


In this section we have derived a simple analytic model for disk arrays,

$$U \approx \frac{1}{1 + \frac{1}{L} \left( 1/p - 1 + \gamma \right)},$$

based upon two approximations:

• $E\!\left(\sum_{i=1}^{M} r_i\right) = (1/p - 1)E(R)/L$,

• $p_0 E(R)/E(S) = 1$.


With regard to the first approximation, we will show that it is very accurate and introduces very 
small errors. The second approximation is more difficult to justify. While it is not an accurate 
approximation for certain workloads, the error introduced into the analytic model is insensitive 
to the accuracy of the approximation under those same workloads. The approximation introduces 
errors on the order of ±10%. 

The model also contains an undefined parameter $\gamma$, a complex function of the model system’s
parameters. We will empirically calibrate the value of $\gamma$ to a constant and show that this introduces
only small errors to the analytic model.





4 Validation of the Analytic Model 

In this section, we calibrate and validate the analytic model developed in the previous section 
via simulation. We show that the parameter $\gamma$ can be calibrated to a constant. The resulting
analytic model closely approximates the simulation results over the range of system and workload 
parameters investigated. 

4.1 The Disk Model 

The disk model is based upon the IBM 0661 3.5 inch 320 MB SCSI disk drive. Figure 4 tabulates 
the parameters and plots the seek profile of the simulated disk. 

4.2 Simulation Parameters 

The simulation parameters of interest are as follows: 

• Input Variables. 

N Number of disks in array. 

SU Size of the stripe unit. 

L Load, that is, the number of processes generating array requests. 

SZ Array request size. 

S Disk service time distribution (implicit in the disk model). 

• Output Variable. 

U Utilization. (Note that for the model system of interest, throughput is proportional to U 
and response time is inversely proportional to U.) 

Recall that, 

n = Request size (number of disks/stripe units accessed per request);
SZ/SU.

p = Probability that a request will access a given disk;
n/N.

4.3 Simulation Results 

Figure 5 plots the utilization from a representative simulation run versus the utilization predicted
by Equation 7 for a disk array consisting of 17 disks and a stripe unit size of 32 KB for four values
of $\gamma \in \{0, 1, p(1 - p), 0.15\}$. As previously mentioned, $\gamma \approx 0$ when $n \in \{1, N\}$, that is, when
$\mathit{SZ} \in \{\mathit{SU}, N \cdot \mathit{SU}\}$; thus, we first try $\gamma = 0$. As expected, Equation 7 with $\gamma = 0$ models the
utilization of the system fairly well when $n \in \{1, N\}$; however, the analytic model with $\gamma = 0$
overestimates the utilization at other values of n. This is because $\gamma = 0$ underestimates $E(r_0)$
when n is between 1 and N. To get an idea for the sensitivity of the analytic model to $\gamma$, Figure 5
also plots the utilization of the simulation run versus the utilization predicted by Equation 7 over
the same range of input parameters with $\gamma = 1$. The resulting analytic model is highly inaccurate.

Parameter                      Value
cylinders per disk             949
tracks per cylinder            14
sectors per track              48
bytes per sector               512
track skew in sectors          4
revolution time                13.9 ms
single cylinder seek time      2.0 ms
average seek time              12.5 ms
max stroke seek time           25.0 ms
max sustained transfer rate    1.7 MB/s


[Plot: Seek Time Versus Seek Distance; horizontal axis: seek distance in cylinders.]


Figure 4: Disk Characteristics. The graph plots the seek time in milliseconds versus the seek
distance in cylinders. The curve is derived from the following formula:

$$\mathrm{seekTime}(x) = \begin{cases} 0 & \text{if } x = 0, \\ a(x - 1)^{0.5} + b(x - 1) + c & \text{if } x > 0, \end{cases}$$

where x is the seek distance in cylinders and a, b and c are constants chosen to satisfy the single
cylinder, max stroke and average seek times. For the simulated disk, a = 0.4623, b = 0.0092 and
c = 2.
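
The seek profile can be checked against the tabulated disk parameters with a short sketch. The function names are hypothetical, the maximum seek distance is assumed to be 948 cylinders (one less than the cylinder count), and the mean is computed under the assumption that the start and end cylinders of a seek are independent and uniformly distributed, which the text does not state.

```python
import math

A, B, C = 0.4623, 0.0092, 2.0   # constants quoted for the simulated disk
CYLINDERS = 949                  # cylinders per disk, from the parameter table


def seek_time_ms(distance):
    """Seek time in milliseconds for a seek of `distance` cylinders."""
    if distance == 0:
        return 0.0
    return A * math.sqrt(distance - 1) + B * (distance - 1) + C


def mean_seek_time_ms(cylinders=CYLINDERS):
    """Mean seek time over uniformly random (start, end) cylinder pairs."""
    total = 0.0
    for d in range(1, cylinders):
        pairs = 2 * (cylinders - d)          # ordered pairs at distance d
        total += pairs * seek_time_ms(d)
    return total / (cylinders * cylinders)   # zero-distance pairs add 0 ms


if __name__ == "__main__":
    print(f"single cylinder: {seek_time_ms(1):.2f} ms")
    print(f"max stroke:      {seek_time_ms(CYLINDERS - 1):.2f} ms")
    print(f"mean:            {mean_seek_time_ms():.2f} ms")
```

Under these assumptions the three computed values land close to the single cylinder, max stroke and average seek times listed for the disk.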

[Plots: Analytic vs. Empirical U (c=0); Analytic vs. Empirical U (c=1).]

Figure 5: Analytic vs. Empirical Models. $c = \gamma$; N = 17; SU = 32 KB; $L \in \{1, 2, 4, 8, 16, 32\}$;
$\mathit{SZ} \in \{32\,\text{KB}, 64\,\text{KB}, \ldots, 544\,\text{KB} = N \times \mathit{SU}\}$. Each pair of lines is labeled with its corresponding
value of L. Each graph illustrates the accuracy of the analytic model for the given value of $\gamma$, which
is denoted by c in the title of each graph.

The best result is achieved when $\gamma = p(1 - p)$. This makes $\gamma \approx 0$ at the boundaries when
$n \in \{1, N\}$ and somewhat positive elsewhere. In this case, 0.25 is the maximum value for $\gamma$, reached
at p = 0.5. The resulting correspondence between the analytic and simulated utilization is very
good. Unfortunately, using $\gamma = p(1 - p)$ introduces higher order dependencies with respect to p into
Equation 7. Consequently, we will try $\gamma = 0.15$ to see if we can improve over our original choice of
$\gamma = 0$ without introducing higher order dependencies. The resulting correspondence between the
analytic and simulated utilization, while not as good as with $\gamma = p(1 - p)$, is still good. Henceforth,
we will assume that $\gamma$ can be accurately modeled as a constant and, where a specific value for $\gamma$ is
needed, we will assume that $\gamma = 0.15$.
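
To get a feel for how much the choice of the calibration parameter moves the prediction, the four candidate values can be plugged into Equation 7 side by side. This is a sketch with hypothetical function names; N = 17 matches the simulated array, while L = 4 is simply one of the simulated loads picked for illustration.

```python
def predicted_utilization(L, n, N, gamma):
    """Equation 7 with p = n/N and an explicit calibration parameter."""
    p = n / N
    return 1.0 / (1.0 + (1.0 / p - 1.0 + gamma) / L)


def gamma_quadratic(n, N):
    """Candidate gamma = p(1 - p): zero at n = N, at most 0.25 (at p = 0.5)."""
    p = n / N
    return p * (1.0 - p)


if __name__ == "__main__":
    N, L = 17, 4
    print(" n   g=0    g=0.15  g=p(1-p)  g=1")
    for n in range(1, N + 1, 4):
        row = [predicted_utilization(L, n, N, g)
               for g in (0.0, 0.15, gamma_quadratic(n, N), 1.0)]
        print(f"{n:2d}  " + "  ".join(f"{u:.3f}" for u in row))
```

Because Equation 7 decreases monotonically in the calibration parameter, a larger value always lowers the predicted utilization; the constant 0.15 sits between 0 and the peak of p(1 - p).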

5 Error Analysis 

In the derivation of the analytic model, we have made the following approximations: 

• $p_0 E(R)/E(S) \approx 1$,

• $E\!\left(\sum_{i=1}^{M} r_i\right) \approx (1/p - 1)E(R)/L$,

• $\gamma \approx 0.15$.

Section 4 has already shown that the above approximations result in an accurate analytic
model over a range of system and workload parameters. In this section, we examine the accuracy,
sensitivity and the error introduced by each approximation. Our methodology is to rewrite the
exact equation for utilization, Equation 2, in terms of variables $\alpha$, $\beta$ and $\gamma$. If we could calculate
the values of these variables exactly, we would have an exact analytic model. We show that the
approximate analytic model, Equation 7, can be derived by substituting specific estimates for the
true values of the variables $\alpha$, $\beta$ and $\gamma$. Thus, we can study how the approximations affect the
error in the analytic model by determining how inaccuracies in the estimated values for $\alpha$, $\beta$ and $\gamma$
contribute to the error in the analytic model.

5.1 Exact and Approximate Models 

Let

$$\alpha \equiv p_0 E(R)/E(S),$$

$$\beta \equiv E\!\left(\sum_{i=1}^{M} r_i\right) L / E(R),$$

$$\gamma \equiv E(r_0)\,L / E(R).$$

Note that $\beta \approx E(M)$, the average number of array requests that are issued until an array request
that accesses a given disk is issued. Rewriting Equation 2 in terms of $\alpha$, $\beta$ and $\gamma$ we have,

$$U = \frac{1}{1 + \frac{\alpha}{L} \left( \beta + \gamma \right)}. \tag{10}$$

Note that Equation 10 is an exact equation for the expected utilization of the model system and
does not utilize any of the approximations used in deriving the analytic model.

Let

$$\hat{\alpha} = 1, \qquad \hat{\beta} = 1/p - 1, \qquad \hat{\gamma} = 0.15.$$

The above definitions for $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$, estimating the true values $\alpha$, $\beta$ and $\gamma$ respectively, directly
correspond to the three approximations we have made in deriving the analytic model. Substituting
$\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$ for $\alpha$, $\beta$ and $\gamma$ into Equation 10 results in the approximate analytic model given by
Equation 7 and denoted below as $\hat{U}$:

$$\hat{U} = \frac{1}{1 + \frac{\hat{\alpha}}{L} \left( \hat{\beta} + \hat{\gamma} \right)}. \tag{11}$$


The primary question of interest in the following section is how errors in the estimators $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$
affect the error of the analytic model, $\hat{U}$.


5.2 Propagation of Error 


The previous section has shown that the approximate equation for utilization, Equation 7, can be
viewed as derived from the exact equation for utilization, Equation 2, by estimating the variables
$\alpha$, $\beta$ and $\gamma$. This section looks at how inaccuracies in the estimates of $\alpha$, $\beta$ and $\gamma$ affect the error of
the analytic model.

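We know that,

$$dU = \frac{\partial U}{\partial \alpha}\,d\alpha + \frac{\partial U}{\partial \beta}\,d\beta + \frac{\partial U}{\partial \gamma}\,d\gamma. \tag{12}$$

The above equation shows how small changes in $\alpha$, $\beta$ and $\gamma$ affect U. Analogously,

$$\delta U \approx \frac{\partial U}{\partial \alpha}\,\delta\alpha + \frac{\partial U}{\partial \beta}\,\delta\beta + \frac{\partial U}{\partial \gamma}\,\delta\gamma. \tag{13}$$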


The above equation shows how small inaccuracies in $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$ affect the error in $\hat{U}$. For example,
the first term of Equation 13 shows how small inaccuracies in $\hat{\alpha}$ affect the accuracy of $\hat{U}$;
unfortunately, the first term of Equation 13 also depends on U, which depends on $\beta$ and $\gamma$. This means
that we may incorrectly calculate the error contributed by $\hat{\alpha}$ due to inaccuracies in the other two
variables.

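Because of the above drawback to using Equation 13 directly, we will instead use the following
equations as the basis of error analysis.

$$U_{\hat{\alpha}} = \frac{1}{1 + \frac{\hat{\alpha}}{L} \left( \beta + \gamma \right)}, \tag{14}$$

$$U_{\hat{\beta}} = \frac{1}{1 + \frac{\alpha}{L} \left( \hat{\beta} + \gamma \right)}, \tag{15}$$

$$U_{\hat{\gamma}} = \frac{1}{1 + \frac{\alpha}{L} \left( \beta + \hat{\gamma} \right)}. \tag{16}$$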


The main advantage to using these equations rather than Equation 11 is that errors in $U_{\hat{\alpha}}$, $U_{\hat{\beta}}$
and $U_{\hat{\gamma}}$ can be directly attributed to the inaccuracy in $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$ respectively. Table 1 formally
defines the terms error, sensitivity, accuracy, relative error and relative accuracy.

Estimator        Error                    Sensitivity                      Accuracy                  Rel Err                      Rel Acc
$\hat{\alpha}$   $U_{\hat{\alpha}} - U$   $\partial U / \partial \alpha$   $\hat{\alpha} - \alpha$   $(U_{\hat{\alpha}} - U)/U$   $(\hat{\alpha} - \alpha)/U$
$\hat{\beta}$    $U_{\hat{\beta}} - U$    $\partial U / \partial \beta$    $\hat{\beta} - \beta$     $(U_{\hat{\beta}} - U)/U$    $(\hat{\beta} - \beta)/U$
$\hat{\gamma}$   $U_{\hat{\gamma}} - U$   $\partial U / \partial \gamma$   $\hat{\gamma} - \gamma$   $(U_{\hat{\gamma}} - U)/U$   $(\hat{\gamma} - \gamma)/U$

Table 1: Definition of Error, Sensitivity and Accuracy. The above definitions use the true values
U, $\alpha$, $\beta$ and $\gamma$, which are unknown. We will use the simulated values for U, $\alpha$, $\beta$ and $\gamma$ whenever the
true values are required for computations. Although the simulated values will never equal the true
values, the simulated values can be made to approximate the true values arbitrarily closely. For
comparison purposes, we will find it convenient to use relative error and relative accuracy rather
than error and accuracy directly. Note that just as the error is approximately equal to the accuracy
times the sensitivity, the relative error is approximately equal to the relative accuracy times the
sensitivity.
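
The error attribution defined above can be sketched in a few lines. The function names are hypothetical, and the "true" values of alpha, beta and gamma passed in are placeholders supplied by the caller; in the paper they come from simulation. The partial derivatives are obtained by differentiating Equation 10 directly.

```python
def utilization(alpha, beta, gamma, L):
    """Equation 10: utilization in terms of alpha, beta and gamma."""
    return 1.0 / (1.0 + alpha * (beta + gamma) / L)


def sensitivities(alpha, beta, gamma, L):
    """Partial derivatives of Equation 10 with respect to each variable."""
    u2 = utilization(alpha, beta, gamma, L) ** 2
    return {
        "alpha": -u2 * (beta + gamma) / L,
        "beta": -u2 * alpha / L,
        "gamma": -u2 * alpha / L,
    }


def relative_errors(alpha, beta, gamma, L, p, gamma_hat=0.15):
    """Relative error attributed to each estimator, one substitution at a time."""
    u = utilization(alpha, beta, gamma, L)
    substituted = {
        "alpha": utilization(1.0, beta, gamma, L),            # alpha_hat = 1
        "beta": utilization(alpha, 1.0 / p - 1.0, gamma, L),  # beta_hat = 1/p - 1
        "gamma": utilization(alpha, beta, gamma_hat, L),      # gamma_hat = 0.15
    }
    return {name: (u_hat - u) / u for name, u_hat in substituted.items()}


if __name__ == "__main__":
    # Hypothetical "true" values; real ones would be measured by simulation.
    print(relative_errors(alpha=0.8, beta=3.1, gamma=0.2, L=4, p=0.25))
    print(sensitivities(alpha=0.8, beta=3.1, gamma=0.2, L=4))
```

Substituting one estimator at a time keeps the other two variables at their true values, so each relative error reflects only the inaccuracy of the estimator being tested.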

5.3 Simulation Results 

Figure 6 plots the simulated and estimated U, relative error, relative accuracy and sensitivity
corresponding to each of the estimated parameters $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$. The first row of graphs in Figure 6
illustrates the overall relative error in the analytic model. This is roughly equal to the sum of the
relative errors due to $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$ illustrated in the succeeding rows. Note that the overall relative
error of the analytic model is generally smaller than ±5%. Before discussing the relative error,
relative accuracy and sensitivity of $\alpha$, $\beta$ and $\gamma$ individually, we make several general comments
concerning all three variables. First, the relative errors due to $\alpha$, $\beta$ and $\gamma$ rarely exceed ±10%.
Second, the relative inaccuracy rarely exceeds ±1. Third, for the simulated parameters, the absolute
value of the sensitivity of all three variables is always less than one and is small in general. This
implies that the model is fairly robust and is insensitive to inaccuracies in the approximations used
to derive the model. Fourth, when the relative inaccuracy is high, the sensitivity tends to be low,
resulting in a small relative error. Fifth, the sensitivity of all three variables tends to decrease as
L increases.

The second row of Figure 6 illustrates the relative error, relative accuracy and sensitivity of
$\alpha$ and corresponds to the approximation $p_0 E(R)/E(S) \approx 1$. Simulation shows that this is a
good approximation for small request sizes but is inaccurate for large request sizes; as request
sizes become large, $p_0 E(R)/E(S)$ approaches zero. Fortunately, the sensitivity also decreases with
increasing request size, resulting in small errors. Note from Figure 6 that at the smallest request
size, the approximation becomes less accurate with increasing load. The absolute value of the
sensitivity also increases with increasing load but reaches a maximum of approximately 0.35 and then
decreases. This leads us to believe that the analytic model will continue to display relatively small
errors at higher loads.

The third row illustrates $\beta$ and corresponds to the approximation $E\!\left(\sum_{i=1}^{M} r_i\right) \approx (1/p - 1)E(R)/L$.
As previously stated, we believe this to be a very good approximation, which the evidence now confirms.
In the graph plotting U versus $U_{\hat{\beta}}$, the two sets of lines are almost indistinguishable. The relative
error is generally less than ±1%.

Finally, the fourth row corresponds to the approximation $\gamma \approx 0.15$. In addition to the general
comments already made, we note that the error introduced by $\gamma$ tends to cancel out the error
introduced by $\alpha$. This is not surprising given that $\gamma$ was empirically calibrated to reduce the
overall error in the analytic model.

Figure 6: Error, Accuracy and Sensitivity.

5.4 Summary 

In this section we examined the error introduced by the approximations made in deriving the 
analytic model. We examined the accuracy and sensitivity of each approximation. While some 
of the approximations are grossly inaccurate for certain workloads, this does not introduce large 
errors because the model is insensitive to the approximations at such workloads. Finally, because 
the model is generally insensitive to inaccuracies in the approximations, it is reasonably robust. 

6 The Optimal Stripe Unit Size 

In this section, we will use the analytic model to derive an equation for the optimal stripe unit
size, the stripe unit size that maximizes throughput in megabytes per second. The equation for
the optimal stripe unit size is useful as a rule of thumb in configuring disk arrays and also provides 
valuable insights into the factors that influence the optimal stripe unit size. Given today’s disk 
technology, the optimal stripe unit equation is most useful for workloads consisting of I/O requests 
that are a couple of hundred or more kilobytes in size. Miller [11] has shown that such workloads 
are typical of scientific applications. For such workloads, we have found that there is typically a 
10-20% degradation in performance when the stripe unit is a factor of two smaller or larger than 
the optimal size. 

In addition to deriving the equation for the optimal stripe unit size, we will show that the stripe 
unit size that maximizes throughput also minimizes response time. Note, however, that maximizing 
throughput is not the same as maximizing utilization; just because a disk is busy does not mean that 
it is doing useful work. The fundamental tradeoff in selecting a stripe unit size is one of parallelism 
versus concurrency. Small stripe unit sizes increase the parallelism available for servicing a single 
request by mapping a request over a larger number of disks but reduce concurrency because each 
request uses a greater number of disks [4]. 

6.1 Derivation 

We will derive the equation for the optimal stripe unit size from Equation 9. But first, because the
disk service time, S, is dependent on the stripe unit size, SU, we must formulate a simple model
that makes this dependency explicit. Recall the definition for the following disk parameters:

• P is the average positioning time (seek + rotational latency).

• X is the sustained data transfer rate (this is the rate at which the disk head reads data off of the
disk platter).

Then, 

E(S) = P + SU/X. (17) 

Note that n, the number of stripe units per request can be calculated as follows: 

n = SZ/SU. (18) 

Substituting Equations 7, 17 and 18 into Equation 9 and simplifying, we can write the throughput
in megabytes per second as,

$$\mathit{MBS} = \frac{L \, N \, X \, \mathit{SU} \, \mathit{SZ}}{(PX + \mathit{SU})\left(N \, \mathit{SU} + \mathit{SZ}\,(L - 1 + \gamma)\right)}. \tag{19}$$

Solving for the local maximum of the above equation as a function of SU ($\gamma$ is assumed to be a
constant) we get the following equation for the optimal stripe unit size:

$$\mathit{optSU} = \sqrt{\frac{PX \, \mathit{SZ}\,(L - 1 + \gamma)}{N}}. \tag{20}$$

Repeating the above procedure to minimize response time starting from Equation 8 results in the 
same equation for the optimal stripe size; thus, the stripe unit size that maximizes throughput also 
minimizes response time and is given by Equation 20. 

The following remarks can be made about Equation 20: 

• Changes to the system that increase the effective load, that is, an increase in L, an increase
in SZ, or a decrease in N, favor larger optimal stripe units. The opposite is true for changes
that decrease the effective load.



• In our model system, the optimal stripe unit size is dependent only on the product PX, the
relative rate at which a disk can position and transfer data, and not on P or X independently.
If you replace the disks with those that position and transfer data twice as quickly, the optimal
stripe unit size remains unchanged [4]. In this respect, the selection of an optimal stripe unit
size is a trade-off between the disk positioning time and the data transfer time.
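The two remarks above can be checked numerically. The following is a minimal sketch, assuming the optimal stripe unit is the maximizer of Equation 19, namely the square root of PX(L - 1 + γ)SZ/N. The function names, the default `gamma=0.0`, and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def throughput_mbs(L, N, X, P, SU, SZ, gamma=0.0):
    """Equation 19: throughput in MB/s (sizes in MB, P in s, X in MB/s)."""
    return (L * N * X * SU * SZ) / ((P * X + SU) * (N * SU + SZ * (L - 1 + gamma)))


def opt_stripe_unit(L, N, X, P, SZ, gamma=0.0):
    """Stripe unit (MB) that maximizes Equation 19 for constant gamma."""
    return math.sqrt(P * X * (L - 1 + gamma) * SZ / N)


if __name__ == "__main__":
    base = dict(L=8, N=16, X=2.0, P=0.02, SZ=1.0)  # hypothetical values
    su = opt_stripe_unit(**base)
    print(f"baseline optimum: {su * 1024:.1f} KB")

    # Remark 1: higher effective load favors a larger optimal stripe unit.
    print("more processes  :", opt_stripe_unit(**{**base, "L": 16}) > su)
    print("larger requests :", opt_stripe_unit(**{**base, "SZ": 2.0}) > su)
    print("fewer disks     :", opt_stripe_unit(**{**base, "N": 8}) > su)

    # Remark 2: disks that position and transfer twice as quickly keep PX,
    # and therefore the optimum, unchanged.
    faster = {**base, "P": base["P"] / 2, "X": base["X"] * 2}
    print("PX invariance   :", math.isclose(opt_stripe_unit(**faster), su))

    # Sanity check: nearby stripe units do not beat the analytic optimum.
    peak = throughput_mbs(SU=su, **base)
    print("local maximum   :", all(
        throughput_mbs(SU=su * f, **base) <= peak for f in (0.9, 0.99, 1.01, 1.1)))
```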


6.2 Validation 

As a further validation of the analytic model and of Equation 20 in particular, we compare the 
analytic values for the optimal stripe unit size with empirically determined values. Figure 7 plots 
the analytically determined optimal stripe unit sizes versus the empirically determined optimal 
stripe unit sizes on a log-log scale. The shaded regions on the figure represent optimal stripe unit 
sizes that can be ruled out for the following reasons. First, throughput when SU < SZ/N for fixed 
SZ is less than or equal to the throughput when SU = SZ/N. At this stripe unit size, requests 
are being distributed uniformly across all disks and it is not possible to increase parallelism or 
concurrency by reducing the stripe unit size. Second, throughput when SU > SZ for fixed SZ is 
identical to when SU = SZ. In this case, the request already fits completely within a single disk 
and there is no advantage or disadvantage to increasing the stripe unit size. We have empirically 
verified the above two facts. Thus, SZ/N ≤ SU ≤ SZ. For comparison purposes, Figure 8 adds the
optimal stripe unit sizes predicted by Chen [4]. Note that Chen’s model assumes that the optimal 
stripe unit size is independent of the request size. 
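One way to apply the two bounds above to an analytically computed stripe unit is simply to clamp it to the interval from SZ/N to SZ. The sketch below assumes this clamping interpretation and takes the unconstrained optimum to be the maximizer of Equation 19; the function name `feasible_stripe_unit`, the default `gamma=0.0`, and the example values are hypothetical.

```python
import math


def feasible_stripe_unit(L, N, X, P, SZ, gamma=0.0):
    """Clamp the analytic optimum (MB) to the range SZ/N <= SU <= SZ."""
    analytic = math.sqrt(P * X * (L - 1 + gamma) * SZ / N)
    lower, upper = SZ / N, SZ
    return min(max(analytic, lower), upper)


if __name__ == "__main__":
    # Hypothetical disk parameters: P = 20 ms, X = 2 MB/s, N = 16 disks.
    for L, SZ in ((1, 1.0), (8, 1.0), (32, 0.004)):
        su = feasible_stripe_unit(L, 16, 2.0, 0.02, SZ)
        print(f"L = {L:2d}, SZ = {SZ:6.3f} MB -> SU = {su * 1024:8.2f} KB")
```

With γ = 0 and a single process the unconstrained value is zero, so the clamped result sits at the lower bound SZ/N; for heavy load and small requests it sits at the upper bound SZ.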

To get a feel for the sensitivity of performance to the choice of stripe unit size, Figure 9 
individually plots each group of lines from Figure 8 with vertical bars to indicate the range of 
stripe unit sizes providing 95% of the throughput of the optimal stripe unit size. Note that there 
is fairly good correlation between the analytically and empirically determined values for optSU. In 
all cases except when L = 1 and request sizes are small, the optimal stripe unit sizes determined
by both Chen’s model and our model lie within the 95% performance intervals. This is remarkable
given the different simulation methodologies and criteria used in selecting the optimal stripe unit
size and indicates that the optimal stripe unit size is a robust property.

Figure 7: Analytic vs. Empirical Optimal Stripe Unit Sizes. N = 16; SU = 32 KB; L ∈
{1,2,4,8,16,32}; Each pair of lines is labeled with its corresponding value of L. optSU = Optimal
stripe unit size in kilobytes. The graph may appear odd at first because the beginning and end of
the individual lines overlap at the edges of the shaded regions.

Figure 8: Comparison with Chen’s Model. N = 16; SU = 32 KB; L ∈ {1,2,4,8,16,32}; Each
pair of lines is labeled with its corresponding value of L. optSU = Optimal stripe unit size in kilobytes.
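The sensitivity to the choice of stripe unit can also be explored analytically. The sketch below computes the range of stripe units for which Equation 19 predicts at least a given fraction of its peak throughput. It is a model-based counterpart to the 95% intervals, not the simulation procedure behind Figure 9, and it ignores the bounds SZ/N and SZ. The function names, the default `gamma=0.0`, and the parameter values are assumptions.

```python
import math


def throughput_mbs(L, N, X, P, SU, SZ, gamma=0.0):
    """Equation 19: throughput in MB/s (sizes in MB, P in s, X in MB/s)."""
    return (L * N * X * SU * SZ) / ((P * X + SU) * (N * SU + SZ * (L - 1 + gamma)))


def near_optimal_range(L, N, X, P, SZ, fraction=0.95, gamma=0.0):
    """Return (lo, hi) in MB: stripe units reaching `fraction` of peak throughput."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    c = SZ * (L - 1 + gamma)
    if c <= 0:
        raise ValueError("requires L - 1 + gamma > 0")
    px = P * X
    # 1/MBS is proportional to g(SU) = N*SU + px*c/SU + (c + N*px),
    # whose minimum over SU > 0 is g_min.
    g_min = 2.0 * math.sqrt(N * px * c) + c + N * px
    # g(SU) <= g_min / fraction  <=>  N*SU**2 - b*SU + px*c <= 0.
    b = g_min / fraction - (c + N * px)
    root = math.sqrt(b * b - 4.0 * N * px * c)
    return (b - root) / (2.0 * N), (b + root) / (2.0 * N)


if __name__ == "__main__":
    # Hypothetical parameters (illustration only).
    lo, hi = near_optimal_range(L=8, N=16, X=2.0, P=0.02, SZ=1.0)
    print(f"95% range: {lo * 1024:.1f} KB to {hi * 1024:.1f} KB")
```

In this model the two endpoints are symmetric about the optimum on a logarithmic scale, since their product equals the square of the optimal stripe unit.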

6.3 Summary 

In this section, we derived an equation for the optimal stripe unit size and validated it via simulation. 
The stripe unit size that maximizes throughput also minimizes response time and is given by
$\mathit{optSU} = \sqrt{P X (L - 1 + \gamma)\,\mathit{SZ} / N}$. We showed that the optimal stripe unit size is dependent only on the
relative rates at which a disk can position and transfer data, PX. Our equation for the optimal 
stripe unit size agrees well with Chen’s [4] equation. 

7 Summary and Future Work 

We have derived, validated and applied an analytic performance model for disk arrays. We modeled 
disk arrays as a closed queueing system consisting of a fixed number, L, of processes continuously
issuing requests of a fixed size, n, to a disk array consisting of N disks. The expected utilization 
of the model system, U, is approximately $1 / \left(1 + \frac{1}{L}\left(\frac{1}{p} - 1 + \gamma\right)\right)$, where p = n/N is the size of the request
as a fraction of the number of disks in the disk array. We directly derived the expected response
time and throughput in megabytes per second as $L\,p\,E(S)/U$ and $U N\,\mathit{SU}/E(S)$, respectively, where E(S) is the
expected service time of a disk request. We showed via simulation that the utilization predicted 
by the analytic model is generally within ±5% of the simulated values. We examined the error, 
accuracy and sensitivity of each approximation made in the derivation of the analytic model to 
better understand the validity and limits of the model. Finally, we applied the analytic model to 
show that the optimal unit of data striping simultaneously maximizes throughput and minimizes
response time and is equal to $\sqrt{\frac{P X (L - 1 + \gamma)}{N}\,\mathit{SZ}}$, where P is the average disk positioning time, X is
the average disk transfer rate and SZ is the request size. 

There are several major areas for future work with respect to the analytic model presented here. 
First, one can extend the workload model to handle non-constant distributions of request sizes and 
something similar to CPU think time, where the processes, instead of simply issuing I/O requests, 
would alternate between computation and I/O. Second, one can extend the types of disk arrays 
to which the analytic model can be applied. In particular, it would be highly desirable to model 
RAID (Redundant Arrays of Inexpensive Disks) [13] systems. Finally, the analytic model can be
applied to other problems in the design and configuration of disk arrays.